Bucketization based Flow Classification Algorithm for Data Stream Privacy Mining
Authors
Abstract
In recent years, data mining has played a major role in managing huge volumes of data and deriving useful information from them. As the volume of data being generated grows, the data must be processed at a pace that keeps up with that growth. However, it is difficult to extract meaningful information from a continuous stream of data. Data-stream mining is a method for discovering important information from large amounts of historical data. To identify useful information, continuous data streams are classified. Current approaches to classifying data streams use supervised learning algorithms, which must be trained with labeled data. Manual labeling of data is usually both expensive and time consuming. As a result, where massive amounts of data arrive at high speed, labeled data may be very sparse. Consequently, only a limited amount of training data may be available for constructing the classification models, which tends to produce poorly trained classifiers. To overcome this issue, this work presents a novel technique for building a classification model from both unlabeled instances and a small number of labeled instances. The model is built using the Flow Classification Algorithm (FCA). Before classification, the correlated attributes in each record set are grouped using a bucketization technique. The FC algorithm is able to judge internally, on the set of labeled data, whether the quality of the models updated from them is sufficient for making use of unlabeled records, or whether more labeled records are needed for ...
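The abstract does not say how correlated attributes are detected or grouped, so the following is only a minimal sketch of one possible reading of that step. The Pearson correlation measure, the 0.8 threshold, the function names `pearson` and `bucketize_attributes`, and the sample columns are all assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def bucketize_attributes(columns, threshold=0.8):
    """Group attribute names whose |correlation| >= threshold.

    The grouping rule and threshold are hypothetical, not the paper's.
    columns: dict mapping attribute name -> list of numeric values.
    Returns a sorted list of sorted attribute-name lists, one per bucket.
    """
    # Union-find over attribute names: correlated attributes share a root.
    parent = {name: name for name in columns}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            parent[find(a)] = find(b)

    buckets = {}
    for name in columns:
        buckets.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in buckets.values())


# Illustrative data only: "age" and "income" rise together, "zip_digit" does not.
records = {
    "age": [23, 35, 47, 52, 61],
    "income": [21, 34, 50, 55, 63],
    "zip_digit": [3, 9, 1, 7, 2],
}
buckets = bucketize_attributes(records)
print(buckets)
```

With this sample data, `age` and `income` fall into one bucket and `zip_digit` stays alone. A real stream would need the correlations to be maintained incrementally over a window rather than recomputed from full columns.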
Similar resources
Improving Privacy And Data Utility For High-Dimensional Data By Using Anonymization Technique
Privacy preservation is one of the significant methods in data mining for hiding sensitive information. Anonymization techniques such as generalization and bucketization have been used for privacy preservation. The main problem with generalization is that it is not applicable to high-dimensional data, and the bucketization technique does not prevent membership disclosure. Slicing is one of the novel techniques ...
A Heuristic Approach to Preserve Privacy in Stream Data with Classification
Data stream mining is a new era in the data mining field. Numerous algorithms are used to extract knowledge from and classify stream data. Data stream mining gives rise to the problem of threats to data privacy. Traditional algorithms are not appropriate for stream data because of its large scale. Building a classification model at large scale also imposes time constraints, which are not fulfilled by traditional al...
A Novel Approach for Data Publishing in Mining
In recent years, advances in technology have led to an increase in the capability to store and record personal data about consumers and individuals. This has led to concerns that the personal data may be misused for a variety of purposes. To address this, a number of techniques have recently been proposed to perform data mining tasks in a privacy-preserving way. These techniques des...
Privacy Preserving Data Stream Classification Using Data Perturbation Techniques
Data stream can be conceived as a continuous and changing sequence of data that continuously arrive at a system to store or process. Examples of data streams include computer network traffic, phone conversations, web searches and sensor data etc. These data sets need to be analyzed for identifying trends and patterns, which help us in isolating anomalies and predicting future behavior. However,...
Semi-Trusted Mixer Based Privacy Preserving Distributed Data Mining for Resource Constrained Devices
In this paper, a homomorphic privacy-preserving association rule mining algorithm is proposed that can be deployed on resource constrained devices (RCD). Privacy-preserving exchange of itemset counts among distributed mining sites is a vital part of the association rule mining process. Existing cryptography-based privacy-preserving solutions consume a lot of computation due to complex mathematical...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013